{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10.6 LightGBM算法案例实战2 - 广告收益回归预测模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**10.6.1 案例背景**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "投资商经常会通过多个不同渠道投放广告，以此来获得经济利益。在本案例中我们选取公司在电视、广播和报纸上的投入，来预测广告收益，这对公司策略的制定是有较重要的意义。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**10.6.2 模型搭建**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-21T02:49:42.190687Z",
     "start_time": "2020-11-21T02:49:41.023502Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>电视</th>\n",
       "      <th>广播</th>\n",
       "      <th>报纸</th>\n",
       "      <th>收益</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>230.1</td>\n",
       "      <td>37.8</td>\n",
       "      <td>69.2</td>\n",
       "      <td>331.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44.5</td>\n",
       "      <td>39.3</td>\n",
       "      <td>45.1</td>\n",
       "      <td>156.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>17.2</td>\n",
       "      <td>45.9</td>\n",
       "      <td>69.3</td>\n",
       "      <td>139.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>151.5</td>\n",
       "      <td>41.3</td>\n",
       "      <td>58.5</td>\n",
       "      <td>277.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>180.8</td>\n",
       "      <td>10.8</td>\n",
       "      <td>58.4</td>\n",
       "      <td>193.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      电视    广播    报纸     收益\n",
       "0  230.1  37.8  69.2  331.5\n",
       "1   44.5  39.3  45.1  156.0\n",
       "2   17.2  45.9  69.3  139.5\n",
       "3  151.5  41.3  58.5  277.5\n",
       "4  180.8  10.8  58.4  193.5"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 读取数据\n",
    "import pandas as pd\n",
    "df = pd.read_excel('广告收益数据.xlsx')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-21T02:49:46.050768Z",
     "start_time": "2020-11-21T02:49:44.415960Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LGBMRegressor()"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1.提取特征变量和目标变量\n",
    "X = df.drop(columns='收益') \n",
    "y = df['收益']  \n",
    "\n",
    "# 2.划分训练集和测试集\n",
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)\n",
    "\n",
    "# 3. 模型训练和搭建\n",
    "from lightgbm import LGBMRegressor\n",
    "model = LGBMRegressor()\n",
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**10.6.3 模型预测及评估**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-21T02:49:48.659185Z",
     "start_time": "2020-11-21T02:49:48.647217Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([192.6139063 , 295.11999665, 179.92649365, 293.45888909,\n",
       "       166.86159398])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预测测试数据\n",
    "y_pred = model.predict(X_test)\n",
    "y_pred[0:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:34:54.835127Z",
     "start_time": "2020-11-11T09:34:54.823159Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>预测值</th>\n",
       "      <th>实际值</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>192.613906</td>\n",
       "      <td>190.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>295.119997</td>\n",
       "      <td>292.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>179.926494</td>\n",
       "      <td>171.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>293.458889</td>\n",
       "      <td>324.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>166.861594</td>\n",
       "      <td>144.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          预测值    实际值\n",
       "0  192.613906  190.5\n",
       "1  295.119997  292.5\n",
       "2  179.926494  171.0\n",
       "3  293.458889  324.0\n",
       "4  166.861594  144.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预测值和实际值对比\n",
    "a = pd.DataFrame()  # 创建一个空DataFrame \n",
    "a['预测值'] = list(y_pred)\n",
    "a['实际值'] = list(y_test)\n",
    "a.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:34:56.853027Z",
     "start_time": "2020-11-11T09:34:56.846047Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([140.00420258])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 手动输入数据进行预测\n",
    "X = [[71, 11, 2]]\n",
    "model.predict(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:34:57.559412Z",
     "start_time": "2020-11-11T09:34:57.550437Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9570203214455993"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看R-square\n",
    "from sklearn.metrics import r2_score\n",
    "r2 = r2_score(y_test, model.predict(X_test))\n",
    "r2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9570203214455993"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看评分\n",
    "model.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:34:59.277868Z",
     "start_time": "2020-11-11T09:34:59.273878Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 950, 1049,  963])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 特征重要性\n",
    "model.feature_importances_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**10.6.4 模型参数调优**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:35:03.514918Z",
     "start_time": "2020-11-11T09:35:03.510901Z"
    }
   },
   "outputs": [],
   "source": [
    "# 参数调优\n",
    "from sklearn.model_selection import GridSearchCV  # 网格搜索合适的超参数\n",
    "parameters = {'num_leaves': [15, 31, 62], 'n_estimators': [20, 30, 50, 70], 'learning_rate': [0.1, 0.2, 0.3, 0.4]}  # 指定分类器中参数的范围\n",
    "model = LGBMRegressor()  # 构建模型\n",
    "grid_search = GridSearchCV(model, parameters,scoring='r2',cv=5) # cv=5表示交叉验证5次，scoring='r2'表示以R-squared作为模型评价准则"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:35:10.359109Z",
     "start_time": "2020-11-11T09:35:04.216021Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'learning_rate': 0.3, 'n_estimators': 50, 'num_leaves': 31}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 输出参数最优值\n",
    "grid_search.fit(X_train, y_train)  # 传入数据\n",
    "grid_search.best_params_  # 输出参数的最优值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9558624845475153"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 重新搭建LightGBM回归模型\n",
    "model = LGBMRegressor(num_leaves=31, n_estimators=50,learning_rate=0.3)\n",
    "model.fit(X_train, y_train)\n",
    "\n",
    "# 查看得分\n",
    "model.score(X_test, y_test)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
